Decompositional Semantics for Events, Participants, and Scripts in Text
This thesis presents a sequence of practical and conceptual developments in decompositional meaning representations for events, participants, and scripts in text under the framework of Universal Decompositional Semantics (UDS) (White et al., 2016a). Part I of the thesis focuses on the semantic representation of individual events and their participants. Chapter 3 examines the feasibility of deriving semantic representations of events from dependency syntax; we demonstrate that predicate-argument structure may be extracted from syntax, but other desirable semantic attributes are not directly discernible. Accordingly, we present in Chapters 4 and 5 state-of-the-art models for predicting these semantic attributes from text. Chapter 4 presents a model for predicting semantic proto-role labels (SPRL), attributes of participants in events based on Dowty's seminal theory of thematic proto-roles (Dowty, 1991). In Chapter 5 we present a model of event factuality prediction (EFP), the task of determining whether an event mentioned in text happened (according to the meaning of the text). Both chapters include extensive experiments on multi-task learning for improving performance on each semantic prediction task. Taken together, Chapters 3, 4, and 5 represent the development of individual components of a UDS parsing pipeline.
In Part II of the thesis, we shift to modeling sequences of events, or scripts (Schank and Abelson, 1977). Chapter 7 presents a case study in script induction using a collection of restaurant narratives from an online blog to learn the canonical "Restaurant Script." In Chapter 8, we introduce a simple discriminative neural model for script induction based on narrative chains (Chambers and Jurafsky, 2008) that outperforms prior methods. Because much existing work on narrative chains employs semantically impoverished representations of events, Chapter 9 draws on the contributions of Part I to learn narrative chains with semantically rich, decompositional event representations. Finally, in Chapter 10, we observe that corpus-based approaches to script induction resemble the task of language modeling. We explore the broader question of the relationship between language modeling and acquisition of common-sense knowledge, and introduce an approach that combines language modeling and light human supervision to construct datasets for common-sense inference.
Hypothesis Only Baselines in Natural Language Inference
We propose a hypothesis only baseline for diagnosing Natural Language
Inference (NLI). Especially when an NLI dataset assumes inference is occurring
based purely on the relationship between a context and a hypothesis, it follows
that assessing entailment relations while ignoring the provided context is a
degenerate solution. Yet, through experiments on ten distinct NLI datasets, we
find that this approach, which we refer to as a hypothesis-only model, is able
to significantly outperform a majority class baseline across a number of NLI
datasets. Our analysis suggests that statistical irregularities may allow a
model to perform NLI in some datasets beyond what should be achievable without
access to the context.
Comment: Accepted at *SEM 2018 as a long paper. 12 pages.
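The degenerate strategy the abstract diagnoses can be sketched in a few lines: a classifier that never reads the premise, scoring labels purely by word overlap with training hypotheses. The toy sentences, label set, and scoring rule below are illustrative assumptions, not the paper's datasets or models:

```python
from collections import Counter, defaultdict

# Toy NLI examples: (premise, hypothesis, label). Sentences are invented for
# illustration, not drawn from any of the ten datasets in the paper.
DATA = [
    ("A man is eating.", "Nobody is eating.", "contradiction"),
    ("A dog runs.", "No animal is moving.", "contradiction"),
    ("A woman sings.", "A person is making noise.", "entailment"),
    ("Kids play outside.", "Children are having fun.", "entailment"),
]

def majority_baseline(labels):
    """Predict the most frequent training label for every test example."""
    return Counter(labels).most_common(1)[0][0]

def hypothesis_only_predict(train, hypothesis):
    """Score labels by word overlap with training hypotheses, never reading
    the premise -- a hypothesis-only model in miniature."""
    word_label_counts = defaultdict(Counter)
    for _premise, hyp, label in train:  # the premise is deliberately unused
        for word in hyp.lower().split():
            word_label_counts[word][label] += 1
    scores = Counter()
    for word in hypothesis.lower().split():
        scores.update(word_label_counts[word])
    if not scores:
        return majority_baseline([label for _, _, label in train])
    return scores.most_common(1)[0][0]

# Negation words in the hypothesis alone suffice to predict "contradiction".
pred = hypothesis_only_predict(DATA, "Nobody is singing.")
```

Such a model exploits exactly the statistical irregularities the abstract mentions: lexical artifacts (here, negation words) that correlate with a label regardless of the context.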
It's Not Easy Being Wrong: Evaluating Process of Elimination Reasoning in Large Language Models
Chain-of-thought (COT) prompting can help large language models (LLMs) reason
toward correct answers, but its efficacy in reasoning toward incorrect answers
is unexplored. This strategy of process of elimination (PoE), when used with
COT, has the potential to enhance interpretability in tasks like medical
diagnoses of exclusion. Thus, we propose PoE with COT, a new task where LLMs
must reason toward incorrect options on multiple-choice questions. We evaluate
the ability of GPT-3.5, LLaMA-2, and Falcon to perform PoE with COT on 2-choice
commonsense and scientific reasoning datasets. We show that PoE consistently
underperforms directly choosing the correct answer. The agreement of these
strategies is also lower than the self-consistency of each strategy. To study
these issues further, we conduct an error analysis and give suggestions for
future work.
Comment: In-progress preprint.
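The task setup can be illustrated with a minimal prompt builder that asks a model to reason toward the incorrect option. The template wording below is a hypothetical sketch; the paper's exact prompts are not reproduced here:

```python
def poe_prompt(question, options):
    """Build a process-of-elimination prompt that asks the model to reason
    step by step toward the INCORRECT option on a multiple-choice question.
    Illustrative template only; the paper's prompt wording may differ."""
    lines = [f"Question: {question}"]
    lines += [f"({chr(ord('A') + i)}) {opt}" for i, opt in enumerate(options)]
    lines.append("Let's think step by step, then name the incorrect option.")
    return "\n".join(lines)

# A 2-choice question, matching the evaluation setting in the abstract.
prompt = poe_prompt("Which object is heavier?", ["a feather", "a brick"])
```

Comparing a model's answers under this prompt against its answers when asked directly for the correct option is what allows the agreement and self-consistency analyses the abstract describes.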
SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models
A common limitation of diagnostic tests for detecting social biases in NLP
models is that they may only detect stereotypic associations that are
pre-specified by the designer of the test. Since enumerating all possible
problematic associations is infeasible, it is likely these tests fail to detect
biases that are present in a model but not pre-specified by the designer. To
address this limitation, we propose SODAPOP (SOcial bias Discovery from Answers
about PeOPle) in social commonsense question-answering. Our pipeline generates
modified instances from the Social IQa dataset (Sap et al., 2019) by (1)
substituting names associated with different demographic groups, and (2)
generating many distractor answers from a masked language model. By using a
social commonsense model to score the generated distractors, we are able to
uncover the model's stereotypic associations between demographic groups and an
open set of words. We also test SODAPOP on debiased models and show the
limitations of multiple state-of-the-art debiasing algorithms.
Comment: EACL 2023.
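Step (1) of the pipeline, name substitution, can be sketched as follows. The name lists and template are hypothetical stand-ins; SODAPOP's actual name sets and its masked-language-model distractor-generation step (2) are not reproduced here:

```python
# Hypothetical name lists standing in for demographic name sets
# (illustration only; not the groups or names used in the paper).
GROUP_NAMES = {
    "group_a": ["Emily", "Greg"],
    "group_b": ["Lakisha", "Jamal"],
}

def substitute_names(template, placeholder="[NAME]"):
    """Instantiate a Social-IQa-style template with names associated with
    different demographic groups, yielding one instance set per group."""
    return {
        group: [template.replace(placeholder, name) for name in names]
        for group, names in GROUP_NAMES.items()
    }

variants = substitute_names("[NAME] visited the doctor. Why did [NAME] go?")
```

Scoring model-generated distractor answers on each group's variants, and comparing the score distributions across groups, is what surfaces the stereotypic word associations the abstract describes.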
What to Read in a Contract? Party-Specific Summarization of Legal Obligations, Entitlements, and Prohibitions
Reviewing and comprehending key obligations, entitlements, and prohibitions
in legal contracts can be a tedious task due to their length and
domain-specificity. Furthermore, the key rights and duties requiring review
vary for each contracting party. In this work, we propose a new task of
party-specific extractive summarization for legal contracts to facilitate
faster reviewing and improved comprehension of rights and duties. To facilitate
this, we curate a dataset comprising party-specific pairwise importance
comparisons annotated by legal experts, covering ~293K sentence pairs that
include obligations, entitlements, and prohibitions extracted from lease
agreements. Using this dataset, we train a pairwise importance ranker and
propose a pipeline-based extractive summarization system that generates a
party-specific contract summary. We establish the need for incorporating a
domain-specific notion of importance during summarization by comparing our
system against various baselines using both automatic and human evaluation
methods.
Comment: EMNLP 2023.
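The core idea of turning pairwise importance comparisons into a ranking for extractive summarization can be sketched with a simple win-count aggregation. The toy clauses are invented, and win counting is a minimal stand-in for the learned pairwise ranker described in the abstract:

```python
from collections import Counter

# Toy pairwise judgments: (more_important, less_important), illustrative
# stand-ins for the expert-annotated comparisons in the dataset.
COMPARISONS = [
    ("Tenant shall pay rent monthly.", "Landlord may repaint the lobby."),
    ("Tenant shall pay rent monthly.", "Tenant may use the gym."),
    ("Tenant may use the gym.", "Landlord may repaint the lobby."),
]

def rank_by_wins(comparisons):
    """Aggregate pairwise comparisons into a ranked list by win count --
    a crude proxy for a trained pairwise importance ranker."""
    wins = Counter()
    for winner, loser in comparisons:
        wins[winner] += 1
        wins[loser] += 0  # ensure clauses that never win still appear
    return [clause for clause, _ in wins.most_common()]

ranking = rank_by_wins(COMPARISONS)
summary = ranking[:2]  # an extractive summary keeps the top-ranked clauses
```

A party-specific summary then follows by ranking only the obligations, entitlements, and prohibitions relevant to one contracting party.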
Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation
We present a large-scale collection of diverse natural language inference
(NLI) datasets that help provide insight into how well a sentence
representation captures distinct types of reasoning. The collection results
from recasting 13 existing datasets from 7 semantic phenomena into a common NLI
structure, resulting in over half a million labeled context-hypothesis pairs in
total. We refer to our collection as the DNC: Diverse Natural Language
Inference Collection. The DNC is available online at https://www.decomp.net,
and will grow over time as additional resources are recast and added from novel
sources.
Comment: To be presented at EMNLP 2018. 15 pages.
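The recasting idea can be illustrated with one hypothetical rule that converts a sentiment-labeled sentence into an NLI context-hypothesis pair. The template and label names below are illustrative; the DNC uses phenomenon-specific recasting procedures for each of its source datasets:

```python
def recast_sentiment(sentence, positive):
    """Recast a sentiment-labeled sentence into an NLI pair (hypothetical
    template, not the DNC's actual recasting rule for any phenomenon)."""
    return {
        "context": sentence,
        "hypothesis": "The author liked it.",
        "label": "entailed" if positive else "not-entailed",
    }

pair = recast_sentiment("The movie was wonderful.", positive=True)
```

Applying such rules across 13 datasets and 7 phenomena is what yields the half-million labeled context-hypothesis pairs in a common NLI structure.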